Comparing visual features for lipreading
Abstract
For automatic lipreading, there are many competing methods for feature extraction. Often, because of the complexity of the task, these methods are tested only on quite restricted datasets, such as the letters of the alphabet or digits, and with only a few speakers. In this paper we compare some of the leading methods for lip feature extraction on the GRID dataset, which uses a constrained vocabulary over, in this case, 15 speakers. Previously the GRID data has received limited attention because of the need to track the face and lips accurately. We overcome this by using a novel linear predictor (LP) tracker to control an Active Appearance Model (AAM). By ignoring the shape and/or appearance parameters of the AAM, we can quantify the effect of appearance and/or shape on lipreading. We find that shape alone is a useful cue for lipreading (which is consistent with human experiments). However, the incremental effect of adding shape to appearance does not appear to be significant, which implies that the inner appearance of the mouth contains more information than the shape.
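The ablation described in the abstract (dropping the shape or the appearance parameters of the AAM before recognition) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_aam_features`, the `mode` argument, and the assumption that each frame's AAM parameters are already available as plain lists of numbers are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def select_aam_features(
    shape: Sequence[Sequence[float]],
    appearance: Sequence[Sequence[float]],
    mode: str = "both",
) -> List[List[float]]:
    """Build one feature vector per video frame from AAM parameters.

    shape[i] and appearance[i] hold the (hypothetical) shape and
    appearance parameters fitted to frame i.
    mode is "shape", "appearance", or "both" (concatenation).
    """
    if len(shape) != len(appearance):
        raise ValueError("shape and appearance must cover the same frames")

    if mode == "shape":
        # Ignore appearance: the features are the shape parameters only.
        return [list(s) for s in shape]
    if mode == "appearance":
        # Ignore shape: the features are the appearance parameters only.
        return [list(a) for a in appearance]
    if mode == "both":
        # Combined features: shape followed by appearance for each frame.
        return [list(s) + list(a) for s, a in zip(shape, appearance)]
    raise ValueError(f"unknown mode: {mode!r}")


# Two frames, 2 shape parameters and 3 appearance parameters per frame.
shape_params = [[0.1, 0.2], [0.3, 0.4]]
appearance_params = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

shape_only = select_aam_features(shape_params, appearance_params, "shape")
combined = select_aam_features(shape_params, appearance_params, "both")
```

Training the same recogniser on each of the three feature sets and comparing accuracy is one way to quantify how much shape adds beyond appearance, under the assumptions above.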
Similar resources
Visual Cortical Entrainment to Motion and Categorical Speech Features during Silent Lipreading
Speech is a multisensory percept, comprising an auditory and visual component. While the content and processing pathways of audio speech have been well characterized, the visual component is less well understood. In this work, we expand current methodologies using system identification to introduce a framework that facilitates the study of visual speech in its natural, continuous form. Specific...
LipNet: End-to-End Sentence-level Lipreading
Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing work on models trained end-to-end perform only word classification, rat...
LipNet: Sentence-level Lipreading
Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequenc...
An extended pose-invariant lipreading system
In recent work, we have concentrated on the problem of lipreading from non-frontal views (poses). In particular, we have focused on the use of profile views, and proposed two approaches for lipreading on basis of visual features extracted from such views: (a) Direct statistical modeling of the features, namely use of view-dependent statistical models; and (b) Normalization of such features by t...
An Image Transform Approach for HMM based Automatic Lipreading
This paper concentrates on the visual front end for hidden Markov model based automatic lipreading. Two approaches for extracting features relevant to lipreading, given image sequences of the speaker's mouth region, are considered: A lip contour based feature approach, which first obtains estimates of the speaker's lip contours and subsequently extracts features from them, and an image transform ...
Publication date: 2009